Graphs are a fantastic method to communicate quantitative results. Compelling and effective graphs are clean, clear and emphasize data above “empty” graphic design. As Edward Tufte (one of the pioneers of data visualisation) wrote, “Confusion and clutter are failures of design, not attributes of information” (E. R. Tufte, Goeler, and Benson 1990, 53). Good data visualisation places data at the forefront, and is grounded in a strong knowledge of statistics.

USSC graphs

A standard USSC graph (for R users, theme_ussc() [1]) has the following elements:

The main content width for the USSC website is 780 pixels (px), which at 96 px per inch translates to 8.125 inches (in) or about 20.64 centimetres (cm). When saving a publication-ready graph, set the width to roughly 8 inches (about 20 cm, just under 780 px). Most graphs fit in this div. For wide graphs that span the entire width of the webpage, set the width to 1320 px (13.75 in or 34.925 cm). If the graphs are web-based, the height can vary.

To avoid compression issues that lead to fuzzy fonts and lines, I set the width to be slightly less than the allowed maximum (e.g. 1300 px instead of 1320 px, or 34 cm instead of 35 cm). Play around with different file types (e.g. SVG or PDF instead of PNG). Avoid JPEG files. As a last resort, consider removing titles and captions from the file and adding them to the website manually. [2]
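The width conversions above can be checked with a few lines of Python. This is only a sketch: the 96 px per inch factor is inferred from the figures quoted above (780 px = 8.125 in), and the function names are made up for illustration.

```python
# Assumption: 96 px per inch, inferred from "780 px = 8.125 in" above.
PX_PER_INCH = 96
CM_PER_INCH = 2.54


def px_to_inches(px):
    """Convert a width in pixels to inches."""
    return px / PX_PER_INCH


def px_to_cm(px):
    """Convert a width in pixels to centimetres."""
    return px_to_inches(px) * CM_PER_INCH


def inches_to_px(inches):
    """Convert a width in inches to pixels."""
    return inches * PX_PER_INCH


# Print the standard, safe-wide and full-wide widths in all three units.
for width_px in (780, 1300, 1320):
    print(width_px, "px =", px_to_inches(width_px), "in =",
          round(px_to_cm(width_px), 2), "cm")
```

Note that a graph saved at exactly 8 inches comes out at 768 px, which leaves a small margin under the 780 px maximum.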

USSC colour schemes

In statistical graphics, there are three kinds of palettes: qualitative, sequential and diverging. The first is used for coding categorical information and the latter two are for numerical or ordinal variables.

Diverging palettes emphasize a midpoint, so the midpoint must be meaningful or worth highlighting. Take exam results: if a result above 50% is a pass, then a result below 50% is a fail. Plot exam results on a diverging scale centred on 50% and you can tell straight away who passed and who failed. Plot the same results on a sequential scale and this might not be as obvious.
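To make the midpoint idea concrete, here is a rough Python sketch that maps an exam score to a colour on a diverging scale centred on the pass mark. The three anchor colours are taken from the three-colour light palette in this guide; which end stands for "fail" is an assumption, and diverging_colour() is a hypothetical helper, not part of the ussc R package.

```python
# Anchor colours from the three-colour light palette in this guide.
# Assumption: red marks the failing end, light blue the passing end.
LOW, MID, HIGH = "#ED1B35", "#765C8C", "#009DE3"


def hex_to_rgb(hex_code):
    """'#ED1B35' -> (237, 27, 53)."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def mix(hex_a, hex_b, t):
    """Blend two colours linearly in RGB; t=0 gives hex_a, t=1 gives hex_b."""
    a, b = hex_to_rgb(hex_a), hex_to_rgb(hex_b)
    blended = tuple(round(x + (y - x) * t) for x, y in zip(a, b))
    return "#{:02X}{:02X}{:02X}".format(*blended)


def diverging_colour(score, midpoint=50.0, lowest=0.0, highest=100.0):
    """Colour for a score on a diverging scale centred on `midpoint`."""
    score = max(lowest, min(highest, score))  # clamp to the valid range
    if score < midpoint:
        return mix(LOW, MID, (score - lowest) / (midpoint - lowest))
    return mix(MID, HIGH, (score - midpoint) / (highest - midpoint))


for score in (0, 35, 50, 65, 100):
    print(score, diverging_colour(score))
```

Scores on either side of 50 get visibly different hues, which is the point of a diverging scale; a sequential scale would only vary lightness along a single hue.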

The six-colour scale listed first is categorical. Of the palettes that follow, two are sequential (blue and greyscale) and the remainder (main, light and dark) are diverging.

If you have categorical data with more than 6 categories, collapse them! The greater the number of categories, the harder it is to compare them. The folks over at the Urban Institute agree with this advice.
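One simple way to collapse categories is to keep the most frequent ones and lump the rest together. The sketch below does this in plain Python; collapse_categories() and the "Other" label are illustrative choices, and in practice the grouping should make substantive sense for your data rather than be purely frequency-based.

```python
from collections import Counter


def collapse_categories(labels, max_categories=6, other_label="Other"):
    """Keep the most common categories and lump the rest into `other_label`.

    The result has at most `max_categories` distinct values, counting
    `other_label` itself as one of them.
    """
    counts = Counter(labels)
    if len(counts) <= max_categories:
        return list(labels)
    # Reserve one slot for the lumped "Other" category.
    keep = {name for name, _ in counts.most_common(max_categories - 1)}
    return [label if label in keep else other_label for label in labels]


# Hypothetical data: eight categories, A (8 rows) down to H (1 row).
raw = [name for name, n in zip("ABCDEFGH", range(8, 0, -1)) for _ in range(n)]
print(Counter(collapse_categories(raw)))
```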

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)
#CCCCCC
rgb(204, 204, 204)
hsl(0, 0%, 80%)
#8C8C8C
rgb(140, 140, 140)
hsl(0, 0%, 55%)
#000000
rgb(0, 0, 0)
hsl(0, 0%, 0%)

Two colours

Light colour palette
scale_colour_ussc('light')
scale_fill_ussc('light')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Dark colour palette
scale_colour_ussc('dark')
scale_fill_ussc('dark')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Blue colour palette
scale_colour_ussc('blue')
scale_fill_ussc('blue')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)

Greyscale colour palette
scale_colour_ussc('grey')
scale_fill_ussc('grey')

#CCCCCC
rgb(204, 204, 204)
hsl(0, 0%, 80%)
#000000
rgb(0, 0, 0)
hsl(0, 0%, 0%)

Three colours

Main colour palette
scale_colour_ussc('main')
scale_fill_ussc('main')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Light colour palette
scale_colour_ussc('light')
scale_fill_ussc('light')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#765C8C
rgb(118, 92, 140)
hsl(273, 21%, 45%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Dark colour palette
scale_colour_ussc('dark')
scale_fill_ussc('dark')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#842A51
rgb(132, 42, 81)
hsl(334, 52%, 34%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Blue colour palette
scale_colour_ussc('blue')
scale_fill_ussc('blue')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#0E6BA8
rgb(14, 107, 168)
hsl(204, 85%, 36%)
#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)

Greyscale colour palette
scale_colour_ussc('grey')
scale_fill_ussc('grey')

#CCCCCC
rgb(204, 204, 204)
hsl(0, 0%, 80%)
#8C8C8C
rgb(140, 140, 140)
hsl(0, 0%, 55%)
#000000
rgb(0, 0, 0)
hsl(0, 0%, 0%)

Four colours

Main colour palette
scale_colour_ussc('main')
scale_fill_ussc('main')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#097BBB
rgb(9, 123, 187)
hsl(202, 91%, 38%)
#4E71A9
rgb(78, 113, 169)
hsl(217, 37%, 48%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Light colour palette
scale_colour_ussc('light')
scale_fill_ussc('light')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#4F71A9
rgb(79, 113, 169)
hsl(217, 36%, 49%)
#9E466F
rgb(158, 70, 111)
hsl(332, 39%, 45%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Dark colour palette
scale_colour_ussc('dark')
scale_fill_ussc('dark')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#612f5b
rgb(97, 47, 91)
hsl(307, 35%, 28%)
#a72548
rgb(167, 37, 72)
hsl(344, 64%, 40%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Blue colour palette
scale_colour_ussc('blue')
scale_fill_ussc('blue')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#097bbc
rgb(9, 123, 188)
hsl(202, 91%, 39%)
#125a95
rgb(18, 90, 149)
hsl(207, 78%, 33%)
#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)

Greyscale colour palette
scale_colour_ussc('grey')
scale_fill_ussc('grey')

#CCCCCC
rgb(204, 204, 204)
hsl(0, 0%, 80%)
#A1A1A1
rgb(161, 161, 161)
hsl(0, 0%, 63%)
#5D5D5D
rgb(93, 93, 93)
hsl(0, 0%, 36%)
#000000
rgb(0, 0, 0)
hsl(0, 0%, 0%)

Five colours

Main colour palette
scale_colour_ussc('main')
scale_fill_ussc('main')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#0E6BA8
rgb(14, 107, 168)
hsl(204, 85%, 36%)
#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#765C8C
rgb(118, 92, 140)
hsl(273, 21%, 45%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Light colour palette
scale_colour_ussc('light')
scale_fill_ussc('light')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#3B7CB7
rgb(59, 124, 183)
hsl(209, 51%, 47%)
#765C8C
rgb(118, 92, 140)
hsl(273, 21%, 45%)
#B13B60
rgb(177, 59, 96)
hsl(341, 50%, 46%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Dark colour palette
scale_colour_ussc('dark')
scale_fill_ussc('dark')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#50315F
rgb(80, 49, 95)
hsl(280, 32%, 28%)
#842A51
rgb(132, 42, 81)
hsl(334, 52%, 34%)
#B82243
rgb(184, 34, 67)
hsl(347, 69%, 43%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Blue colour palette
scale_colour_ussc('blue')
scale_fill_ussc('blue')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#0784C5
rgb(7, 132, 197)
hsl(201, 93%, 40%)
#0E6BA8
rgb(14, 107, 168)
hsl(204, 85%, 36%)
#15518B
rgb(21, 81, 139)
hsl(209, 74%, 31%)
#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)

Greyscale colour palette
scale_colour_ussc('grey')
scale_fill_ussc('grey')

#CCCCCC
rgb(204, 204, 204)
hsl(0, 0%, 80%)
#ACACAC
rgb(172, 172, 172)
hsl(0, 0%, 67%)
#8C8C8C
rgb(140, 140, 140)
hsl(0, 0%, 55%)
#464646
rgb(70, 70, 70)
hsl(0, 0%, 27%)
#000000
rgb(0, 0, 0)
hsl(0, 0%, 0%)

Six colours

Main colour palette
scale_colour_ussc('main')
scale_fill_ussc('main')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#10619C
rgb(16, 97, 156)
hsl(205, 81%, 34%)
#0589CB
rgb(5, 137, 203)
hsl(200, 95%, 41%)
#2F83C0
rgb(47, 131, 192)
hsl(205, 61%, 47%)
#8E4E7A
rgb(142, 78, 122)
hsl(319, 29%, 43%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Light colour palette
scale_colour_ussc('light')
scale_fill_ussc('light')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#2F83C0
rgb(47, 131, 192)
hsl(205, 61%, 47%)
#5E699D
rgb(94, 105, 157)
hsl(230, 25%, 49%)
#8E4E7A
rgb(142, 78, 122)
hsl(319, 29%, 43%)
#BD3457
rgb(189, 52, 87)
hsl(345, 57%, 47%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Dark colour palette
scale_colour_ussc('dark')
scale_fill_ussc('dark')

#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)
#453362
rgb(69, 51, 98)
hsl(263, 32%, 29%)
#6F2D57
rgb(111, 45, 87)
hsl(322, 42%, 31%)
#99274B
rgb(153, 39, 75)
hsl(341, 59%, 38%)
#C32040
rgb(195, 32, 64)
hsl(348, 72%, 45%)
#ED1B35
rgb(237, 27, 53)
hsl(353, 85%, 52%)

Blue colour palette
scale_colour_ussc('blue')
scale_fill_ussc('blue')

#009DE3
rgb(0, 157, 227)
hsl(199, 100%, 45%)
#0589CB
rgb(5, 137, 203)
hsl(200, 95%, 41%)
#0B75B4
rgb(11, 117, 180)
hsl(202, 88%, 37%)
#10619C
rgb(16, 97, 156)
hsl(205, 81%, 34%)
#164D85
rgb(22, 77, 133)
hsl(210, 72%, 30%)
#1C396E
rgb(28, 57, 110)
hsl(219, 59%, 27%)

Greyscale colour palette
scale_colour_ussc('grey')
scale_fill_ussc('grey')

#CCCCCC
rgb(204, 204, 204)
hsl(0, 0%, 80%)
#B2B2B2
rgb(178, 178, 178)
hsl(0, 0%, 70%)
#989898
rgb(152, 152, 152)
hsl(0, 0%, 60%)
#6F6F6F
rgb(111, 111, 111)
hsl(0, 0%, 44%)
#383838
rgb(56, 56, 56)
hsl(0, 0%, 22%)
#000000
rgb(0, 0, 0)
hsl(0, 0%, 0%)
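
The listings above give each colour as a hex code with its RGB and HSL equivalents. If you need to convert or sanity-check a value yourself, the Python standard library is enough. The sketch below also shows that, in the three-colour light, dark and blue palettes, the middle colour matches the RGB midpoint of the two end colours. That is an observation about the listed values, not a description of how the ussc package builds its palettes; the larger palettes and the greyscale do not follow it exactly. The helper names are illustrative.

```python
import colorsys


def hex_to_rgb(hex_code):
    """'#1C396E' -> (28, 57, 110)."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(28, 57, 110) -> '#1C396E'."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def rgb_to_hsl(rgb):
    """(r, g, b) in 0-255 -> (hue in degrees, saturation %, lightness %)."""
    r, g, b = (channel / 255 for channel in rgb)
    # colorsys returns hue, lightness, saturation (HLS order), each in 0..1.
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return (round(h * 360) % 360, round(s * 100), round(l * 100))


def rgb_midpoint(hex_a, hex_b):
    """Colour halfway between two colours, channel by channel (floored)."""
    a, b = hex_to_rgb(hex_a), hex_to_rgb(hex_b)
    return rgb_to_hex(tuple((x + y) // 2 for x, y in zip(a, b)))


print(hex_to_rgb("#009DE3"), rgb_to_hsl(hex_to_rgb("#009DE3")))
print(rgb_midpoint("#009DE3", "#1C396E"))  # middle of the 3-colour blue palette
```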

General data visualization tips

Avoid plotting graphs with a dual y-axis. Readers’ eyes are naturally drawn to the gap between the two lines, even when there is no comparison to be made. This is especially egregious when a “smaller” value appears greater than the “larger” value because they’re on different scales. More on this from the team at Datawrapper, as well as a few pieces from notable statisticians and social scientists. Here’s an academic study detailing why dual y-axes are not ideal. [3]
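One commonly suggested alternative to a dual y-axis is to rebase both series to a common index (first observation = 100) and plot them on a single axis, so that only relative change is compared. The sketch below is an illustration, not something this guide prescribes; index_to_base() and the sample figures are hypothetical.

```python
def index_to_base(values, base=100.0):
    """Rescale a series so that its first observation equals `base`."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot index a series that starts at zero")
    return [base * v / first for v in values]


# Hypothetical series measured in very different units.
exports_billion = [50, 55, 60]
employment_thousands = [2000, 2100, 2050]

# After indexing, both series share one y-axis: "index, first year = 100".
print(index_to_base(exports_billion))
print(index_to_base(employment_thousands))
```

Faceting the two series into separate panels is another option that avoids implying a comparison between unrelated scales.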

Erase non-data ink, within reason. We want the data to shine!

People use statistics and graphs to distort data, often by deliberately creating misleading graphs. Because many readers lack visualisation literacy, this is, quite frankly, an effective tactic if your only goal is to convince someone that you’re right. Data, statistics, and graphs all add credence to an argument, and many readers, especially those who don’t know statistics, are reluctant to critique quantitative evidence because a) they simply don’t know how to and b) they assume the author is more statistically knowledgeable than they are: the author made the graph, after all, and presumably knows the data far better than the reader.

On the other hand, a major contributing factor to misleading graphs is the innumeracy of data viz creators themselves. The proliferation of tools like Excel, graphic design programs, and other data viz packages leads anyone to think they can create a graph (just like anyone can write the next great American novel!). However, a lot of these people lack basic statistical skills, which leads to obvious mistakes. Tufte and others have written on this extensively (E. Tufte and Graves-Morris 1983, p79-87). Indeed, Tufte wrote, “Lurking behind the inept graphic is a lack of judgement about quantitative evidence. … Illustrators too often see their work as an exclusively artistic enterprise. … Those who get ahead are those who beautify data, never mind statistical integrity.” (1983, p79). Still, some of the worst graphs have come from scientists, engineers, computer scientists or economists. There are plenty of examples of truly shocking data viz out there, but we should hold ourselves to a higher standard.

Nathan Yau (a prolific statistician) wrote about misleading viz here. Michael Correll, a research scientist at Tableau, and Jeffrey Heer, a professor of data visualization and human-computer interaction, recently wrote a piece differentiating between malicious misleading visualizations and just plain incorrect visualizations here.

How do people distort data?

In The Visual Display of Quantitative Information [4], Edward Tufte devotes a whole chapter to the idea of “graphical integrity”. He writes, “The main defense of the lying graphic is … ‘Well, at least it was approximately correct, we were just trying to show the general direction of change.’ … A second defense of the lying graphic is that although the design itself lies, the actual numbers are printed on the graphic for those picky folks who want to know the correct size of the effects displayed. … Few writers would work under such a modest standard of integrity, and graphic designers should not either.” (E. Tufte and Graves-Morris 1983, p76-77, emphasis my own.) On page 77, he concludes the chapter with some advice:

Graphical integrity is more likely to result if these six principles are followed:

The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numeric quantities represented.

Clear, detailed, and thorough labelling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.

Show data variation, not design variation.

In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.

The number of information-carrying (variable) dimensions depicted should not exceed the number of the dimensions in the data.

Graphics must not quote the data out of context.
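
Two of these principles lend themselves to a quick numeric check. The first sketch computes what Tufte, in the same book, calls the Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data, where a value near 1 means the graphic is proportional. The second deflates a nominal amount with a price index. Function names and all sample numbers are illustrative, and the price index values are made up.

```python
def relative_change(first, second):
    """Size of an effect as a fraction of the starting value."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data."""
    shown = relative_change(drawn_first, drawn_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


def deflate(nominal, price_index, base_index=100):
    """Express a nominal amount in base-period prices."""
    return nominal * base_index / price_index


# Illustrative: data rise from 18 to 27.5 (about 53%), but the bars drawn
# for them grow from 0.6 to 5.3 inches (about 783%).
print(round(lie_factor(18, 27.5, 0.6, 5.3), 1))

# Illustrative: 100 units of currency in a year when the index stood at 125.
print(deflate(100, price_index=125))
```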

For example …

The colours in a statistical graphic should cooperate with each other. The typical purpose of colour in a statistical graphic is to distinguish between different areas or symbols in the plot — to distinguish between different groups or between different levels of a variable. This means that there will typically be several colours, or a palette of colours, used within a plot and that those colours should be related to each other. (Zeileis, Hornik, and Murrell 2009, 2)

Examples:

Bar charts

Histograms

Line charts

Tip: avoid spaghetti graphs by faceting the data or by highlighting a subset of the data.

Scatter plots

Maps

Should you use a map to display your data?


Choropleth maps are often seen as problematic because geographic areas and population vary in size, so maps might mislead the viewer. Yet, people love choropleth maps. (They are pretty!) Remember to ask yourself whether you’re plotting the effect of said variable or if you’re just plotting the population density.
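
A quick way to check whether a choropleth is just showing population is to map a rate instead of a raw count. The sketch below uses made-up regions and numbers; rate_per_100k() is an illustrative helper.

```python
def rate_per_100k(count, population):
    """Events per 100,000 residents."""
    return count * 100_000 / population


# Hypothetical regions: (number of events, resident population).
regions = {
    "Region A": (500, 1_000_000),
    "Region B": (50, 50_000),
}

for name, (count, population) in regions.items():
    print(name, count, rate_per_100k(count, population))
```

Region A has ten times as many events as Region B, so a map of raw counts would shade it darkest, yet its rate is half of Region B's. Shading by the rate shows the variable rather than where people live.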

Pie charts

A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. (E. Tufte and Graves-Morris 1983, 178).

References

Tufte, Edward R, Nora Hillman Goeler, and Richard Benson. 1990. Envisioning Information. Graphics Press, Cheshire, CT.

Tufte, Edward, and P Graves-Morris. 1983. The Visual Display of Quantitative Information.

Zeileis, Achim, Kurt Hornik, and Paul Murrell. 2009. “Escaping RGBland: Selecting Colors for Statistical Graphics.” Computational Statistics & Data Analysis 53 (9). Elsevier: 3259–70.


  1. If you use R, feel free to check out this ussc ggplot2 guide.

  2. I plan to fix this problem in the near future…

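  3. Several data viz packages, including Datawrapper and ggplot2, deliberately do not support charts with two independent y-axes (ggplot2’s sec_axis() only allows a secondary axis that is a one-to-one transformation of the primary axis). Hadley Wickham was quite clear about why he hasn’t given users the option to do so.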

  4. Simon has a copy or it can be found in Sydney Uni Library. It is a valuable, insightful (dare I say necessary?) read.